Wellcome Open Research — Latest Matching Preprints

1

Using Bayesian Evidence Synthesis to estimate the number of sex workers in the United Kingdom

Long, H.; Gada, L.; Murray, L.; Laurence, T.; Hayward, A.; Finnie, T.

2026-05-26 public and global health 10.64898/2026.05.21.26353767 medRxiv

Top 0.1%

12.7%


Sex work is diverse and includes a broad range of people and settings. Over the last thirty years, a large proportion of public health emergencies of international concern (PHEIC) have involved infections transmitted through sexual or close contact and in sexual networks (WHO 2024). Sex workers can face increased disadvantage in relation to these public health emergencies. Given the significant health inequalities sex workers can face, they should be eligible to receive targeted and tailored health support to reduce health protection risks (Hester 2019; Jeal and Salisbury 2004a). However, they are often not explicitly eligible for targeted and tailored support due to a lack of information on incidence, prevalence of disease, and even more basic data such as reliable estimates of the number of sex workers in the UK. Accordingly, the aim of this paper is to determine a population size estimate, with uncertainty, that is more robust than those currently available. In this study, we apply Bayesian Evidence Synthesis to bring together historic estimation efforts with recent ONS National Population Estimates and Genito-Urinary Medicine Clinics Attendance Data (GUMCAD) from the UK Health Security Agency (UKHSA). A key feature of our model is the embedding of uncertainty from each input study in model priors, hence propagating it through to our final estimate. The Bayesian evidence synthesis model estimated a total of 84,000 sex workers in the United Kingdom (95% credible interval: 49,000-130,000), representing 0.121% of the current UK population.

2

Peri Operative deLta rEnin ConcentrATion (POLECAT) Study Protocol and Analysis Plan

Boyer, N.; Haider, S.; Piercy, C.; Zarbock, A.; Samuels, T. L.; Papadopoulou, A.; Forni, L. G.; Creagh Brown, B.

2026-05-27 intensive care and critical care medicine 10.64898/2026.05.26.26352884 medRxiv

Top 0.1%

10.1%


Background: Post-operative hypotension and vasoplegia are well recognised following cardiac surgery but remain poorly characterised after major non-cardiac surgery, despite associations with acute kidney injury (AKI), cardiovascular complications, and increased mortality. Dysregulation of the renin angiotensin aldosterone system (RAAS) may underpin haemodynamic instability in this setting, yet data in abdominal surgery are limited. Objectives: The POLECAT (Perioperative delta Renin) study aims to determine whether changes in circulating renin concentration (delta renin) from pre-operative baseline to the early post-operative period are associated with post-operative vasoplegia in patients undergoing major abdominal surgery requiring intensive care admission. Methods: POLECAT is a single-centre, prospective observational study conducted at a UK tertiary referral hospital. Adult patients undergoing planned or emergency abdominopelvic surgery with anticipated intensive care admission are enrolled. Blood samples are obtained pre-operatively, within four hours post-operatively, and on post-operative day one to measure renin and a panel of endothelial, renal, and immune biomarkers. The primary outcome is post-operative vasoplegia, defined as the requirement for a vasopressor infusion at 08:00 on post-operative day one. Secondary outcomes include alternative vasoplegia definitions, AKI (KDIGO criteria), vasopressor burden, organ dysfunction, cardiovascular complications, length of stay, and mortality. Multivariable regression, receiver operating characteristic analyses, and predefined subgroup analyses will be performed, with sensitivity analyses addressing missing data. Conclusions: This study will clarify the relationship between peri-operative RAAS dysfunction and vasoplegia following major abdominal surgery. Findings may support biomarker-guided risk stratification and inform future interventional trials targeting haemodynamic instability in this high-risk population.

3

The Telesafe archive: creating a database of UK primary care telephone consultations

Edwards, P. J.; Caddick, B.; Skeen, A.; Lin, J.; Ridd, M. J.; Barnes, R. K.; Salisbury, C.

2026-05-26 primary care research 10.64898/2026.05.19.26353559 medRxiv

Top 0.1%

6.9%


Background: In 2024, one-third of GP appointments in England were conducted by telephone. What happens during these consultations is largely unknown. Aim: To test the feasibility of collecting recorded GP telephone consultations with linked data and consent for future research use. Design and setting: Retrospective observational study in seven practices in South West England. Method: Adults who had a telephone consultation at practices that routinely record calls were invited to consent to retrieval of call audio, a 4-month electronic health record (EHR) extract and a post-consultation patient questionnaire. Practice-level consent rates were analysed using regression models. Results: Of 28 clinicians recruited, 19 GPs had consultations with patients whose recordings were retrievable, usable, and consented for future research. Of 2,053 invitations, 123 patients consented (6.0%). Consent was lower in more deprived practices (IMD 1-2 vs 9-10: OR=0.22, 95% CI 0.09-0.54). Of 101 recordings retrieved, 96 were usable and 91 had consent for future research. Of these 91, 86 were linked to EHRs and 89 to post-consultation patient questionnaires. Mean consultation duration was 7 minutes 13 seconds; audible typing was heard in 69% (63/91). In total, 161 problems were discussed (mean 1.77 per consultation). Most patients were happy their consultation was by telephone (96/117, 82%), although the majority reported usually preferring face-to-face appointments (68/115, 59%). Conclusion: It is feasible to assemble a reusable archive of GP telephone consultations with linked data. However, recruitment was low using retrospective remote consent. Future work should test alternative recruitment approaches, particularly to improve patient engagement at practices serving deprived populations.

4

Highly Efficient Lentiviral Transduction of Human iPSC-Derived Microglia and Macrophages

Goberdhan, S. C.; Czubala, M. A.; Thomas, S. E.; Taylor, P. R.; Connor-Robson, N.

2026-05-27 neuroscience 10.64898/2026.05.23.727402 medRxiv

Top 0.1%

6.6%


Background: Microglia have become a cell type of interest in the neurodegenerative field given both genetic and pathological evidence for their role in disease development and progression. There has been a rapid growth of studies using iPSC-derived microglial models to understand the molecular mechanisms driving these neurological diseases. However, it remains difficult to transduce myeloid cells effectively, which is critical when aiming to study the role of disease-associated genes and pathways. Current methods require exposure to multiple viruses, which is not suitable for all experimental paradigms. We have therefore identified and characterised a promoter and plasmid design that allows high transduction efficiency with a single lentivirus. Results: Using the spleen focus-forming virus (SFFV) promoter in combination with central polypurine tract (cPPT) and woodchuck hepatitis virus post-transcriptional regulatory element (WPRE) plasmid elements gave significantly higher transduction efficiency and transgene expression than the commonly used CMV and EF1 promoters. If required, transduction efficiency could be further improved to over 90% by adding Vpx to remove the lentivirus restriction factor SAM and HD domain-containing protein 1 (SAMHD1). Conclusions: Our findings allow for a simpler, more efficient and streamlined approach to transgene expression in iPSC-derived microglia and macrophages using only a single lentivirus. This minimises potential unintended side effects such as additional cellular activation and increased cell death.

5

Why is team-based hypertension care failing to take hold in Australia? Real-world evidence from primary care

Satheesh, G.; Slater, K.; Trivedi, R.; Clapham, E.; Lopez, F. M.; McCormack, B.; Miranda, J. J.; Mishra, S. R.; Peterson, G. M.; Sarkies, M.; Schutte, A. E.; Chapman, N.

2026-05-26 primary care research 10.64898/2026.05.25.26354005 medRxiv

Top 0.1%

6.3%


Objective: The shortage of general practitioners (GPs) in Australia has intensified interest in team-based care for hypertension, involving pharmacists and nurses. This study explored primary care provider experiences, barriers, and facilitators related to implementing team-based care in Australia. Design: Qualitative study using semi-structured interviews with primary care providers. Methods: We conducted 51 interviews with GPs (n=24), nurses (n=12), and pharmacists (n=15), purposively selected from diverse primary care settings. Analysis combined deductive coding, informed by the Theoretical Domains Framework and Consolidated Framework for Implementation Research, with inductive thematic analysis to identify emergent themes. Results: Interviews demonstrated a predominantly GP-centred care model, with nurse and pharmacist involvement largely confined to supporting roles, including blood pressure measurement, prescription refills, patient follow-up and counselling. Their contributions were constrained by barriers at both practice (e.g., limited GP support, fragmented communication across providers) and health system levels (e.g., limited financial incentives and restricted reimbursement pathways). Despite their critical role in care planning, nurses described being hamstrung by workload and limited direct funding for hypertension-related services. Pharmacists reported unreimbursed blood pressure checks and restricted funding for medication reviews that constrained the sustainability of their hypertension services. Role ambiguity and the absence of standardised protocols on task sharing further limited collaboration, with nurses and pharmacists describing concerns about overstepping professional boundaries. Attitudes towards team-based care ranged from active disregard (outright rejection) to conditional acceptance and occasional active uptake (strong endorsement). 
Conclusion: Despite clear willingness among nurses and pharmacists to alleviate GP burden, team-based care is rarely implemented in routine practice. Addressing system-level barriers (funding models that incentivise team-based care and standardised treatment protocols that clarify shared workflows), alongside provider-level barriers (stronger awareness and training that normalises task sharing), is critical to support genuine team-based hypertension care in Australia.

6

The AFRIDIARRHEA multimodal fusion framework for Estimating the Burden of Diarrheal Diseases Among Children Under Five in Kenya, Zimbabwe, and Somaliland

Agumba, J. O.; Namusonge, L.; AFRIDIARRHEA CONSORTIUM; Ogendo, J. O.; Hassan, M. A.; Waswa, L. M.; Takavarasha, M.; Shisanya, M. S.

2026-06-02 epidemiology 10.64898/2026.06.01.26354632 medRxiv

Top 0.1%

4.9%


Background: Accurate estimation of childhood diarrheal disease burden in Africa remains challenging because of limited surveillance, incomplete mortality data, pathogen-attribution uncertainty, and complex environmental and socioeconomic drivers. This study developed the African Diarrheal Disease Integrated Risk Intelligence and Burden Estimation Architecture (AFRIDIARRHEA), a multimodal fusion framework for estimating under-five diarrheal burden in resource-constrained settings. Methods: AFRIDIARRHEA integrates Bayesian epidemiological modeling, machine learning, temporal forecasting, geospatial analytics, pathogen attribution, environmental intelligence, and uncertainty quantification within a unified framework. Synthetic datasets representing Kenya, Zimbabwe, and Somaliland were used to evaluate mortality, morbidity, hospitalization burden, pathogen-attributed mortality, and predictive performance. Results: The framework identified substantial heterogeneity in disease burden across countries, with Zimbabwe exhibiting the highest modeled mortality and morbidity burden and Somaliland the highest hospitalization burden. Rotavirus and Shigella were the dominant contributors to pathogen-attributed mortality. The multimodal fusion model outperformed the Bayesian baseline and individual component models, achieving improved predictive accuracy, robust uncertainty calibration, and strong agreement with benchmark estimates. Conclusions: AFRIDIARRHEA demonstrates the potential of multimodal fusion modeling for integrated estimation of childhood diarrheal burden, pathogen attribution, and uncertainty in African settings. The framework provides a scalable, transparent, and policy-relevant approach for supporting vaccine prioritization, WASH investments, outbreak preparedness, and child survival programs in data-limited environments. Keywords: Diarrheal disease, burden estimation, multimodal fusion, pathogen attribution, machine learning, uncertainty quantification, Africa

7

Optimisation of steatotic liver disease screening algorithm for resource-poor settings using machine learning

Mettananda, C.; Sivasumithran, K.; Ranaweera, L.; Madhubhashini, A.; Ranawaka, C.; Pathmeswaran, A.; Dassanayake, A.

2026-06-10 endocrinology 10.64898/2026.06.09.26355306 medRxiv

Top 0.1%

4.9%


Background: The European Association for the Study of the Liver (EASL) steatotic liver disease (SLD) screening algorithm involves two steps: initial screening with FIB-4, followed by referral for vibration-controlled transient elastography (VCTE) in patients likely to have significant fibrosis (SF). However, VCTE is not widely available in resource-limited settings. Aim: To optimise the EASL SLD screening algorithm for resource-poor settings using machine learning (ML). Methods: We analysed data from 964 adults aged ≥35 years who underwent VCTE at a tertiary referral centre in Sri Lanka between November 2024 and 2025. Multiple ML models using different methods and variable combinations were trained on 80% of the dataset and tested on the remaining 20%. The best models were selected based on performance and externally validated using data from 430 patients who underwent VCTE before November 2024. Model performance was compared with FIB-4 using confusion matrices. Results: A Random Forest model incorporating age, AST, ALT, and platelet count as separate variables, rather than combined into the FIB-4 score, outperformed FIB-4. The all-variable ML model showed the best predictive performance for SF, with an accuracy of 77.2%, recall of 0.762, precision of 0.778, and AUC-ROC of 0.818. The variables used in the model, in descending order of feature importance, were AST, platelet count, BMI, ALT, age, diabetes mellitus, hypertension, dyslipidaemia, sex, family history, hypothyroidism, diabetes complication and smoking. External validation demonstrated 75.1% accuracy and an AUC of 0.779. When used as the first step of the SLD screening algorithm, the all-variable ML model identified 37 (17.1%) additional true positives and reduced false-negative diagnoses by 50% compared with FIB-4. Conclusions: ML-based models were more effective than the FIB-4 score as the first-line screening tool for VCTE referral, substantially improving the identification of patients with significant fibrosis in this South Asian cohort.
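The accuracy, recall, and precision reported above are standard confusion-matrix quantities. The sketch below shows how they follow from binary predictions (1 = significant fibrosis). The function names are hypothetical and the labels are invented for illustration; they are not the study's data or code.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def screening_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity) and precision from a confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of true SF cases caught
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of referrals that are true SF
    }


# Invented example labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(screening_metrics(y_true, y_pred))
```

In a referral setting, recall matters most because every false negative is a patient with significant fibrosis who is not sent for VCTE.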

8

Stage-aware transcriptomics reveals selective haplotype persistence in short-term ex vivo cultured Plasmodium vivax

Abagero, B. R.; Dumetz, F.; Ford, C. T.; Tolosa, T.; Tesefay, D.; Lukas, B.; Shenkutie, T.; Popovici, J.; Yewhalaw, D.; Serre, D.; Lo, E.

2026-05-13 cell biology 10.64898/2026.05.11.724466 medRxiv

Top 0.2%

4.0%


Plasmodium vivax (Pv) infections are developmentally asynchronous and often polyclonal, complicating interpretation of bulk parasite transcriptomes. Here, we analyzed paired in vivo and short-term ex vivo transcriptomes from Ethiopian clinical isolates using stage deconvolution and PvMSP1 haplotyping. Ex vivo maturation modestly increased inferred schizont representation while largely preserving the proportion of trophozoites and gametocytes. After adjustment for parasite stage composition, in vivo and ex vivo transcriptomes remained globally similar, with no genes significantly differentially expressed, indicating the absence of a major culture-induced transcriptional response. In contrast, short-term culture reduced multiplicity of infection, contracted within-host haplotype diversity, and non-randomly depleted specific haplotypes, consistent with a clonal bottleneck. In a subset of low-complexity infections, residual expression patterns clustered by dominant haplotype, suggesting genotype-associated transcriptional heterogeneity independent of developmental stage. Together, these findings indicate that short-term ex vivo culture enriches late asexual stages and selectively filters clones rather than inducing a common transcriptional program. These results show that ex vivo cultures are a reliable way to study gene expression, especially for late stages. However, analyses need to explicitly model developmental composition and infection complexity when interpreting Pv transcriptomes from natural infections.
Author summary: Malaria caused by Plasmodium vivax is difficult to study because this parasite cannot yet be grown continuously in the laboratory and infections in patients often contain parasites at different developmental stages and multiple parasite lineages at the same time.
In this study, we wanted to understand how much of the parasite gene-expression signal reflects true biological differences, and how much is explained by parasite development or changes that occur during short-term laboratory maturation. We compared parasites collected directly from patients in Ethiopia with matched parasites matured briefly outside the body. We found that short-term culture mainly increased the proportion of later-stage parasites, but after accounting for developmental stage, the overall gene-expression patterns remained very similar. However, culture reduced the diversity of parasite lineages within infections, suggesting that some parasite lineages survive better than others under laboratory conditions. Our findings highlight that natural Pv infections are complex mixtures of parasite stages and lineages. Accounting for this complexity will improve how researchers interpret parasite gene-expression studies and design future studies of parasite invasion, transmission, and survival.

9

Machine-Assisted Topic Analysis of Large-Scale Health Experience Data: Identifying Sociodemographic Differences and Evaluating Bias in Large Language Models

Bondaronek, P.; Ward, E.; Beecham, E.; Zhang, E.; Huang, Y.; Ive, J.; Naughton, F.; Wu, H.; Vindrola-Padros, C.

2026-05-22 public and global health 10.64898/2026.05.20.26353755 medRxiv

Top 0.2%

4.0%


Introduction: Large-scale free-text data with socio-demographic information can capture nuanced accounts of lived experience that are difficult to detect in structured measures. However, manual qualitative analysis is difficult to scale, while automated approaches may obscure subgroup variation or introduce bias. This is especially relevant for large language models (LLMs), whose use in qualitative health research is increasing despite limited evaluation in socio-demographically stratified analysis. Objectives: This study examined how socio-demographic differences in health and wellbeing experiences were manifested in a large-scale free-text dataset, and evaluated how different AI-assisted analytic approaches identified these differences. Specifically, it aimed to: (1) identify socio-demographic differences using Machine-Assisted Topic Analysis (MATA); (2) compare MATA outputs with topic modelling combined with LLM-based topic interpretation; and (3) examine potential bias in LLM-based analysis. Methods: We analysed 2,177 valid free-text responses from the UK COVID-19 Wellbeing Tracker, a longitudinal survey of adults recruited during the pandemic. Responses described factors influencing health behaviours, mood, and wellbeing over time. Data were preprocessed and stratified by gender, age, and socioeconomic status (SES). MATA combined topic modelling, using Latent Dirichlet Allocation, with human-led qualitative interpretation of topic keywords and representative responses. The same topic model outputs were then interpreted using an LLM for comparison. Potential LLM bias was assessed using a demographic label-swap crossover design, with bias evaluated through Jaccard lexical similarity, VADER sentiment, and NRC emotion analysis. Grounded Review and Assessment of Computational Evidence (GRACE) was used to evaluate the AI outputs.
Results: MATA identified meaningful socio-demographic thematic differences in pandemic-related mood and wellbeing across gender, age, and SES. Common themes included disruption, adaptation, uncertainty, routine, and the influence of work, relationships, and health on wellbeing. Male-stratified topics emphasised routines, habits, and coping with external pressures, whereas female-stratified topics were more relational and reflective, focusing on connection, isolation, family wellbeing, and anxiety. Lower SES narratives included practical strain, financial pressure, and loss of control, while higher SES narratives more often reflected adjustment, autonomy, and meaning-making. Older adults described health, gratitude, and family connection, whereas younger adults emphasised work-related stress and competing demands. LLM-based interpretation broadly reproduced the high-level subgroup patterns identified through MATA, but outputs were more generalised, less conceptually differentiated, and showed greater thematic overlap. Bias analysis showed systematic shifts in vocabulary, sentiment, and emotional tone when demographic labels were swapped, suggesting a risk of representational bias. Conclusions: MATA identified meaningful socio-demographic differences while retaining interpretative depth at scale. LLM-based topic interpretation showed utility for rapid thematic summarisation, but produced less conceptually differentiated outputs and was sensitive to demographic framing. The analysis also identified "LLM speak", where outputs appeared coherent but relied on abstract, generalised, and overlapping interpretations. Human oversight, structured qualitative appraisal, and explicit bias evaluation are necessary when using LLMs to analyse socially stratified free-text health data.
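Jaccard lexical similarity, one of the bias measures named above, treats two texts as sets of words and divides the size of their intersection by the size of their union, so 1.0 means identical vocabulary and 0.0 means no shared words. A minimal sketch, assuming simple lowercase word tokenisation (the abstract does not describe the study's preprocessing); the example outputs are invented and not drawn from the study:

```python
import re


def tokens(text):
    """Lowercase word set; the tokenisation rule is an illustrative assumption."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


# Invented LLM outputs for the same topic under swapped demographic labels
out_label_a = "Connection, isolation and family wellbeing shaped mood."
out_label_b = "Routine, habits and coping shaped mood."
print(round(jaccard(out_label_a, out_label_b), 3))
```

In a label-swap design, a low similarity between outputs that differ only in the demographic label suggests the label itself is shifting the model's vocabulary.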

10

Integrating a Non-Communicable Disease Care Cascade within Ghana's Community-Based Health Planning and Services (CHPS) Program: the COMBINE Pilot Implementation Trial

Heller, D. J.; Elkersh, Y.; Nonterah, E. A.; Kuwolamo, I.; Horowitz, C. R.; Alvarez, E. E.; Awine, T.; Govindarajulu, U.; Squires, A. P.; Aborigo, R. A.

2026-06-05 primary care research 10.64898/2026.06.03.26354834 medRxiv

Top 0.2%

4.0%


Introduction: Hypertension is the world's leading cause of death, and depression its leading cause of disability. Control rates for these noncommunicable diseases (NCDs) are low in low- and middle-income countries (LMICs). Many LMICs have programs to screen and treat underserved communities for infectious diseases, but evidence to adapt them to treat NCDs is limited. We developed and tested a non-communicable disease program through Ghana's Community-Based Health Planning and Services (CHPS) primary care initiative. Methods: We trained 8 CHPS nurses to diagnose and treat hypertension and depression through door-to-door screening and pharmacotherapy. Physician assistants provided telehealth supervision. We combined this treatment with volunteer counseling to boost medication adherence, improve mood, and change health behaviors. We called the 90-day intervention the CHPS Opportunity for Mentally and Behaviorally Integrated NCD Engagement (COMBINE). Results: We recruited 60 adults from 580 screened: 37 with hypertension (mean blood pressure (BP) of 149/91 mm Hg) and 23 with depression (mean Patient Health Questionnaire (PHQ-9) score of 13.3). After 90 days, 57/60 (95%) completed the intervention: 32/37 (86%) achieved blood pressure control (mean BP 122/75 mm Hg), and 19 of 20 (95%) achieved depression control (mean PHQ-9 score 2.0). After 12 months, 51/60 were retained: 33/37 with hypertension (89%) and 18/23 with depression (78%), with a mean BP of 121/75 mm Hg and a mean PHQ-9 score of 1.4, respectively. All 51 (100%) achieved disease control at 12 months. Five participants left through migration and four through escalation to higher-level care. Conclusions: The COMBINE model achieved high levels of diagnosis, care retention, and disease control, with minimal adverse events, in a remote setting with limited usual NCD care.
This model suggests a novel means to improve the care cascade for these and other noncommunicable diseases through existing non-physician care models in LMICs, warranting further controlled testing at scale.

11

Social prescribing for children and young people in the UK: characterising access and care pathways using electronic health records

Bone, J. K.; Bu, F.; Hayes, D.; Fancourt, D.

2026-06-03 epidemiology 10.64898/2026.06.02.26354692 medRxiv

Top 0.2%

3.9%


Objectives: We aimed to describe the characteristics of children and young people referred to social prescribing across the UK and understand what social prescribing looks like for these young people. Additionally, we aimed to explore whether access to and experiences of social prescribing vary with age and have changed from 2017 to 2025. Overall, we aimed to identify whether social prescribing reduces or exacerbates health inequalities among children and young people, and whether this has changed over time. Design: Analysis of social prescribing electronic health records. Setting: Social prescribing hubs and services across the UK that use Access Elemental (a cloud-based social prescribing platform). Participants: 52,585 individuals referred to social prescribing in 2017-2025 aged 4-25 years (mean=20.04, SD=4.71), of whom 57% were female, 39% male, <2% were in other gender groups, and 3% did not disclose their gender. Primary and secondary outcome measures: We summarised the characteristics of young people and described the care pathway received. We then used regression models to test whether these factors differed by age and over time. Results: Most individuals were aged 18 and over, 91% lived in urban areas and 58% lived in the three most deprived deciles of the UK. Most were referred by GPs or other allied health workers (79%) and mental health was the leading reason for referral (44%). The typical pathway included a mean of 4.64 social prescribing contacts (SD=7.70) totalling a mean of 66 minutes (SD=108), with 34% receiving an onward referral to community support. The average age of those referred to social prescribing increased over time. Conclusions: Our findings indicate that social prescribing currently has limited reach for those under 18 and this disparity may be increasing. It was promising that children and young people referred to social prescribing were more likely to live in deprived areas.
However, given current findings, more work is needed to increase the reach of social prescribing for children and young people across the UK.

12

VNtyper 2 enables open-access short-read genotyping of MUC1 VNTR variants in ADTKD at high-speed

Popp, B.; Saei, H.; Teltsh, O.; Janousek, V.; Pristoupilova, A.; Vrbacka, A.; Hartmannova, H.; Kidd, K.; Helmuth, J.; Bleyer, A. J.; Wiesener, M.; Fausch, K.; Rowan, C.; Hassan, E. E.; Clince, M.; Cavalleri, G.; Locher, M.; Eckardt, K.-U.; Richter-Pechanska, P.; ADTKD-Net Consortium; Kmoch, S.; Antignac, C.; Conlon, P.; Dorval, G.; Zivna, M.; Halbritter, J.

2026-06-03 nephrology 10.64898/2026.05.27.26352937 medRxiv

Top 0.3%

3.7%


Background: ADTKD-MUC1, one of the major forms of ADTKD, is caused by frameshift variants in the MUC1 VNTR that standard short-read sequencing fails to detect. Existing 59dupC-targeted probe-extension assays do not allow for broad screening and cannot detect atypical non-dupC variants. Recently, VNtyper, a Kestrel-based genotyping pipeline with optional code-adVNTR cross-validation for MUC1 VNTR genotyping from short-read sequencing data, made it possible to circumvent this diagnostic limitation but needed further development for easy access and rapid sample processing. Methods: We developed VNtyper 2 by refactoring VNtyper into a modular, production-grade tool with a companion web platform, VNtyper-Online (https://vntyper.org), for freely available browser-based analysis with short turnaround time and without local bioinformatics infrastructure. We validated VNtyper 2 on 400 simulated samples generated with MucOneUp and 142 clinical exomes with independently confirmed genotypes. Results: In simulation, VNtyper 2 detected the canonical 59dupC variant with 96% sensitivity and 100% specificity. Reference-standard validation on 142 samples yielded 90.6% sensitivity and 98.2% specificity overall, with cohort-dependent performance across the Twist Exome v2 French-German cohort (98% sensitivity, 87.5% specificity) and the KAPA HyperExome V2 (Roche) Czech-US cohort (79.4% sensitivity, 100% specificity). Screening of 3,582 exomes and targeted panels from international CKD referral programmes identified 51 positive individuals, including 9 with atypical non-dupC frameshift variants that would have been missed by 59dupC-targeted probe-extension assays. In unselected CKD cohorts, a descriptive random-effects summary estimated a detection rate of 1.4% (95% CI 0.6 to 3.1%). Conclusions: VNtyper 2 and VNtyper-Online are open-source tools for MUC1 VNTR genotyping from short-read data and can support locally validated workflows when VNTR coverage is adequate.
By improving accessibility and turnaround time, these tools democratize MUC1 diagnostics at global scale. For its integration into routine diagnostics, we propose an expert-informed two-pathway workflow developed through European ADTKD-Net consortium consensus.

13

A wealth index based on two-component polychoric principal component analysis reduces urban bias and improves socioeconomic classification in low- and middle-income country surveys: a validation study using LSMS surveys

Vidaletti, L. P.; Dos Santos, A. M.; Hellwig, F.; Barros, A. J. D.

2026-06-08 epidemiology 10.64898/2026.06.01.26354245 medRxiv

Top 0.3%

3.7%


Background: The traditional wealth index, based on principal component analysis (PCA), used in the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS), suffers from urban bias, distorting estimates of health inequality. We compared the traditional index (PEAR1) with an alternative two-component polychoric PCA index (POLY2) using annual expenditure from 12 LSMS surveys as the gold standard to determine which provides more accurate SEP measures for equitable policy targeting. Methods: We compared the traditional wealth index (PEAR1) with a two-component polychoric PCA approach (POLY2) using 12 LSMS (Living Standards Measurement Study) surveys (2015-2022) from 12 African countries. Annual household consumption expenditure was the gold standard. We assessed agreement using weighted Cohen's kappa and validated against education (proportion of households with secondary or higher education) using the concentration index (CIX) and slope index of inequality (SII). Results: The POLY2 index showed higher agreement with expenditure quintiles (average national weighted kappa = 43.3%) than the PEAR1 index (35.1%), with notable improvements in urban (43.5% vs. 27.5%) and rural (35.3% vs. 22.4%) areas. POLY2 also attenuated extreme household distributions observed in PEAR1. Education validation showed that POLY2 produced intermediate inequality gradients between the flatter expenditure-based gradient and the steeper PEAR1-based gradient. Conclusion: The POLY2 wealth index is superior to the traditional index, reducing urban-rural bias and providing more accurate socioeconomic classifications. Its adoption in large-scale surveys such as DHS and MICS is recommended to improve equitable monitoring of health inequalities in low- and middle-income countries.
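Agreement between wealth-index quintiles and expenditure quintiles was summarised with weighted Cohen's kappa, which gives partial credit when two classifications disagree by only one or two quintiles. A minimal sketch of linearly weighted kappa for ordinal categories coded 0 to k-1; the linear weighting scheme and the toy quintile assignments are assumptions, since the abstract does not state which weights were used:

```python
def weighted_kappa(r1, r2, k):
    """Linearly weighted Cohen's kappa for ordinal categories 0..k-1."""
    n = len(r1)
    # Observed joint proportions of (category in r1, category in r2)
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weight grows linearly with distance between categories
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    observed = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - observed / expected


# Invented quintile assignments (0 = poorest) for seven households
index_quintile = [0, 1, 2, 3, 4, 4, 2]
expenditure_quintile = [0, 2, 2, 3, 3, 4, 1]
print(round(weighted_kappa(index_quintile, expenditure_quintile, 5), 3))
```

For real analyses, scikit-learn's `cohen_kappa_score` with `weights="linear"` or `weights="quadratic"` computes the same statistic and is better tested than a hand-rolled version.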

14

Evaluating longitudinal ecological models linking scientific production to population-level indicators: a global case study in mental health research

Acosta-Monterrosa, A. A.; Hernandez-Paez, D. A.; Visconti-Lopez, F. J.; Kalokoh, S.; Lozada-Martinez, I. D.

2026-05-15 scientific communication and education 10.64898/2026.05.09.723946 medRxiv

Top 0.3%

3.7%

Background: Quantifying the alignment between scientific production and population-level indicators remains a persistent methodological challenge in health research evaluation. While longitudinal ecological models have been increasingly used to explore associations between research output and societal outcomes, their feasibility, interpretability, and structural limitations have not been systematically examined. Methods: We conducted a longitudinal ecological meta-research analysis integrating global bibliometric data on mental health publications with country-level indicators of mental disorders, mental health infrastructure, and subjective well-being. Analyses were stratified by World Bank income groups and implemented using a three-step framework comprising income-specific linear regression models, random-effects meta-analyses, and meta-regressions to assess association patterns, heterogeneity, and potential moderators. Results: Scientific production was highly concentrated in high-income countries. Income-stratified regression models revealed divergent association patterns across contexts, with inverse associations observed in higher-income groups and predominantly positive coefficients in low-income countries. Meta-analyses showed extreme between-group heterogeneity for most indicators, yielding largely attenuated pooled estimates. Only one subjective well-being indicator retained a significant pooled association. Conclusions: Longitudinal ecological models linking scientific production to population-level indicators can identify broad association patterns and structural asymmetries but are strongly constrained by contextual heterogeneity and data availability.
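The three-step framework above pools income-stratified coefficients with a random-effects meta-analysis. As an illustration only, the sketch below uses the DerSimonian-Laird estimator of between-group variance (tau²) and reports I²; the abstract does not name the estimator, and the function name and inputs are hypothetical.

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled estimate, its standard error, tau^2, I^2).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    w = [1.0 / v for v in variances]                      # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-group variance
    w_re = [1.0 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0    # share of variation due to heterogeneity
    return pooled, se, tau2, i2
```

With two income groups whose coefficients have opposite signs (for example -1 and 1, each with variance 0.01), the pooled estimate is 0, tau² is 1.99 and I² is 0.995, which mirrors how extreme heterogeneity can yield attenuated pooled estimates.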

15

Algorithmic Versus Expert Rankings of Large Language Models in Peritoneal Dialysis Prescription Review: A Trap-Embedded Synthetic Benchmark

Wei, C.-H.; Lin, H.-J.; Lai, W.-W.; Lin, H. M.

2026-06-01 nephrology 10.64898/2026.05.28.26354383 medRxiv

Top 0.3%

3.7%

Background: Clinical LLM benchmarks rarely test whether algorithmic rankings agree with expert clinical judgment. We developed a trap-embedded peritoneal dialysis (PD) benchmark comparing multiple scoring constructs with blinded nephrologist ratings. Methods: We generated 125 synthetic PD cases containing 13 ISPD-aligned trap types. Five LLMs (Claude Sonnet 4.5, GPT-5.4, Gemini 3.1 Pro, DeepSeek-R1, Grok 4.1 Fast) evaluated each case three times at temperature 0 (1,875 calls). The primary outcome was must-identify TDR_must, analyzed with GEE and case-clustered bootstrap. Secondary analyses included a verbosity-sensitive alarm-burden proxy, WCS, relaxed-match scoring, WCS sensitivity analyses, and a 25-output blinded expert adequacy substudy. Must-identify kappa was 0.89 in Stage 1 and 0.92 in Stage 2. Results: Rankings were discordant. Recall ranked Claude (0.977) and GPT-5.4 (0.955) above the other models (0.86-0.90, p<0.0001). The alarm-burden proxy favored concise models (Grok 0.689; 21.6 vs 2.4 issues/case), while WCS produced a third ordering. In the expert substudy, inter-rater concordance was strong (rho 0.977), but WCS did not show a positive association with expert adequacy (rho -0.17, p=0.41). Conclusion: Clinical LLM rankings in PD prescription review depend strongly on the scoring construct. Algorithmic metrics should be reported alongside blinded expert adequacy ratings and should not alone determine deployment.
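Because each case was evaluated three times, a bootstrap must resample whole cases so that a case's repeated runs stay together. A minimal sketch of a case-clustered percentile bootstrap for one model's mean recall follows; the data layout (a dict from case id to per-run recall values), the percentile method and the function name are assumptions, and the GEE analysis is not shown.

```python
import random

def clustered_bootstrap_ci(recall_by_case, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean recall, resampling cases (not individual runs).

    recall_by_case: hypothetical layout, case id -> list of per-run recall values.
    Returns (point estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)
    cases = list(recall_by_case)

    def mean_recall(sample):
        # Pool every run of every sampled case; repeated runs travel together.
        values = [v for c in sample for v in recall_by_case[c]]
        return sum(values) / len(values)

    point = mean_recall(cases)
    # Each replicate draws len(cases) cases with replacement.
    stats = sorted(mean_recall([rng.choice(cases) for _ in cases])
                   for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi
```

Resampling runs independently would treat the three near-identical temperature-0 outputs per case as separate evidence and produce intervals that are too narrow.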

16

Winter forecasting of respiratory viruses in Victoria, Australia

Henderson, A. S.; Moss, R.; Adekunle, A. I.; Ye, H.; O'Hara-Wild, M.; Eales, O.; Senior, K. L.; Tobin, R.; Windecker, S. M.; Golding, N.; Robinson, E.; Strachan, J.; Hyndman, R. J.; Dawson, P.; McCaw, J.; McBryde, E.; Shearer, F. M.

2026-05-21 epidemiology 10.64898/2026.05.18.26353544 medRxiv

Top 0.3%

3.6%

Temperate regions of the world, such as southern Australia, often experience increased health burden from respiratory pathogens during winter. The ability to forecast short-term trends in cases of these pathogens is of significant interest to public health. Across the 2024 southern hemisphere winter period, the Australia–Aotearoa Consortium for Epidemic Forecasting and Analytics (ACEFA) ran a pilot respiratory virus forecasting initiative in collaboration with the Victorian Department of Health. Each week from 9 May 2024 to 12 September 2024, the consortium solicited 28-day forecasts of daily case incidence for influenza, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and respiratory syncytial virus (RSV) from multiple research groups. Four component model forecasts were contributed by three research groups, and a fourth group used the component forecasts to generate ensemble forecasts (six models in total: four component models and two ensembles). Here we statistically evaluated the performance of each forecast and of a baseline model against the observed case data. The two ensemble models were frequently the top-performing models. All models performed worse than the baseline model around the epidemic peaks for each pathogen.
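The abstract does not say which scoring rule was used to compare forecasts with the baseline. One common choice for probabilistic forecasts is the continuous ranked probability score (CRPS); the sketch below assumes sample-based forecasts, uses the estimator E|X - y| - 0.5 E|X - X'|, and scores a model relative to a baseline. The function names are hypothetical.

```python
def crps_sample(samples, observed):
    """Sample-based CRPS: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    n = len(samples)
    term1 = sum(abs(x - observed) for x in samples) / n
    # All ordered pairs, including each sample with itself (O(n^2), fine for a sketch).
    term2 = sum(abs(x - xp) for x in samples for xp in samples) / (n * n)
    return term1 - 0.5 * term2

def relative_skill(model_samples, baseline_samples, observations):
    """Total CRPS of a model divided by that of the baseline (< 1 means the model did better)."""
    m = sum(crps_sample(s, y) for s, y in zip(model_samples, observations))
    b = sum(crps_sample(s, y) for s, y in zip(baseline_samples, observations))
    return m / b
```

For a single-valued forecast the CRPS reduces to the absolute error. Computing the ratio only over weeks around the epidemic peaks would show where, as reported above, the models fell behind the baseline.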

17

A novel method to select Reference Proteomes in UniProt

Raposo, P.; Martinez Marin, J. S.; Kim, G.; Insana, G.; Jyothi, D.; Luo, J.; Tunstall, T.; Consortium, U.; Orchard, S.; Steinegger, M.; Martin, M.

2026-05-14 bioinformatics 10.64898/2026.05.12.720148 medRxiv

Top 0.3%

3.6%

Motivation: The ongoing revolution in genome sequencing is delivering an unprecedented number of genome assemblies to global repositories, resulting in an overwhelming amount of data imported to UniProt in the form of proteomes. To manage this growth sustainably, a systematic workflow is needed to select the best proteomes. Results: We propose a novel pipeline for cellular organisms to select the best Reference Proteomes, i.e. those that best represent the protein space of a species. The pipeline uses a clustering algorithm based on MMseqs2 to select the minimum number of Reference Proteomes whilst maximising the representation of the protein space for each species. Additionally, we aligned our viral Reference Proteomes with the exemplar genome set defined by the International Committee on Taxonomy of Viruses. Because this method ensures that every species is represented by at least one Reference Proteome, the UniProt Knowledgebase increased its number of Reference Proteomes by 36% and now covers 34% more species in the Tree of Life. The UniProt Knowledgebase will mainly retain proteins from Reference Proteomes, so this method reduces the overall number of proteins by 43%, leading to a more concise yet representative knowledgebase. Availability and Implementation: https://www.uniprot.org/proteomes Contact: raposo@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
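The selection step above aims to cover a species' protein space with as few proteomes as possible, which has the shape of a set-cover problem. The sketch below is a hypothetical greedy approximation that assumes protein clusters (for example from an MMseqs2 run) are already computed and mapped to the proteomes whose proteins fall in them; it is not UniProt's pipeline, and the function name and input layout are illustrative only.

```python
def select_reference_proteomes(proteome_clusters):
    """Greedy set cover: pick proteomes until every protein cluster is represented.

    proteome_clusters: hypothetical layout, proteome id -> set of cluster ids.
    The clustering step itself is not shown.
    """
    uncovered = set().union(*proteome_clusters.values())
    selected = []
    while uncovered:
        # Take the proteome covering the most still-uncovered clusters;
        # iterating in sorted order makes max() break ties by smallest id.
        best = max(sorted(proteome_clusters),
                   key=lambda p: len(proteome_clusters[p] & uncovered))
        selected.append(best)
        uncovered -= proteome_clusters[best]
    return selected
```

Greedy set cover does not guarantee the true minimum, but it is a standard approximation with a logarithmic worst-case bound, which is why it is a reasonable stand-in for illustration.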

18

Methodological Evaluation and Data Resource for Andes Virus Sequencing Preparedness

Doherty, R.; Lewandowski, K.; Fenwick, A.; Everall, I.; Morley, D.; Hartman, H.; Staplehurst, S.; Kent, C.; Loman, N. J.; Quick, J.; Pullan, S. T.

2026-05-16 genomics 10.64898/2026.05.15.725146 medRxiv

Top 0.3%

3.6%

As part of preparedness activities supporting pathogens classified under the UK High Consequence Infectious Diseases (HCID) framework, we previously evaluated both a whole-genome tiling amplicon sequencing scheme and a pan-viral hybridisation capture approach (TWIST-CVRP) for sequencing Andes virus (ANDV). In light of the recent outbreak, we make available viral sequencing datasets generated using a historical ANDV isolate (Chile, 1997). In addition, we provide an evaluation of tiling amplicon scheme performance and present recommended primer updates informed by in silico comparison with the recently released outbreak genome. These datasets are intended to support benchmarking, validation, and optimisation of bioinformatic pipelines across the community.
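An in silico primer comparison of the kind described above can start by locating each primer on the new genome and counting mismatches, particularly near the 3' end, where mismatches most affect amplification. The sketch below handles forward primers only (a reverse primer would first be reverse-complemented, which moves its 3' end to the start of the matched site); the 5-nucleotide 3' window, the function name and the toy sequences in the example are assumptions, not details from the study.

```python
def best_primer_match(primer, genome, three_prime_window=5):
    """Slide a forward primer along a genome and return the best-matching site as
    (position, total mismatches, mismatches within the 3' window), or None if the
    genome is shorter than the primer. Exact string comparison; no IUPAC ambiguity codes."""
    primer, genome = primer.upper(), genome.upper()
    k = len(primer)
    best = None
    for pos in range(len(genome) - k + 1):
        site = genome[pos:pos + k]
        mismatches = [i for i in range(k) if primer[i] != site[i]]
        # Count mismatches falling in the last `three_prime_window` bases of the primer.
        three_prime = sum(1 for i in mismatches if i >= k - three_prime_window)
        if best is None or len(mismatches) < best[1]:
            best = (pos, len(mismatches), three_prime)
    return best
```

A primer whose best site carries one or more 3' mismatches against the outbreak genome would be a candidate for the kind of update the abstract recommends.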

19

Intention of UK residents to wear facemasks and practise social distancing during the next respiratory virus pandemic

Smith, D. R.; Buckell, J.; Hancock, T. O.; Morrell, L.; Pouwels, K.

2026-05-30 public and global health 10.64898/2026.05.21.26353824 medRxiv

Top 0.3%

3.6%

Background: Wearing facemasks and practising social distancing slow the spread of respiratory pathogens. However, in the event of a new pandemic emerging, the willingness of populations to voluntarily adopt these behaviours is unclear. Methods: A discrete choice experiment was conducted among 2,006 UK-based adults. Participants were presented with hypothetical scenarios describing the emergence of a respiratory virus pandemic and were asked to choose when they would wear facemasks and practise social distancing. A mixed multinomial logit model was used to jointly estimate how disease severity and prevalence, uncertainty in these quantities, and individual-level characteristics influence behavioural choices. Findings: Participants were averse to facemasks and social distancing in the absence of pandemic risk. For each ten-unit increase in severity (10 additional hospitalisations/1,000 infections), the odds of always wearing a facemask outside the home increased by 15.9% (95%CI: 14.3%, 17.5%), relative to rarely/never, and the odds of avoiding all people as much as possible increased by 16.4% (14.6%, 18.2%), relative to not avoiding anyone. Greater disease prevalence, uncertainty in disease severity or disease prevalence, a university education, prior COVID-19 vaccination and non-white ethnicity were also associated with choosing to always wear facemasks and avoid all people as much as possible. The probability of participants choosing to rarely/never wear facemasks varied from 13.4% (11.9%, 14.9%) in the lowest-risk scenario to 1.4% (1.2%, 1.7%) in the highest-risk scenario. Interpretation: Perceived risks of disease and the associated uncertainty drive the intentions of UK adults to adapt their behaviour in a future pandemic.
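The severity effect above is reported as a percentage change in odds per 10-unit increase. Assuming that effect is linear on the log-odds scale, as in a logit model, it can be rescaled to other increments; the helper below is hypothetical, and the 30-unit example is an illustrative extrapolation rather than a result from the study.

```python
import math

def odds_multiplier(pct_per_10_units, units):
    """Convert a reported % change in odds per 10-unit increase into the odds
    multiplier for an arbitrary increase, assuming a log-linear effect."""
    beta_per_unit = math.log(1 + pct_per_10_units / 100) / 10  # log-odds change per unit
    return math.exp(beta_per_unit * units)
```

For example, `odds_multiplier(15.9, 30)` equals 1.159³ ≈ 1.557, an increase in the odds of always wearing a facemask of about 56% for 30 additional hospitalisations per 1,000 infections, under the log-linear assumption.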

20

A protocol for the TRACS-Liverpool study, tracking transmission of extended-spectrum beta-lactamase producing Enterobacterales across health and social care settings in the United Kingdom

Gallichan, S.; Lewis, J. M.; Forrest, S.; Moore, M.; Picton-Barlow, E.; McKeown, C.; Jewell, C. P.; Todd, S.; Graf, F. E.; Feasey, N. A.

2026-05-15 infectious diseases 10.64898/2026.05.13.26352872 medRxiv

Top 0.3%

3.6%

Background: Antimicrobial resistance (AMR) is a global public health problem. Infections caused by extended-spectrum beta-lactamase (ESBL)- and carbapenemase (CP)-producing Enterobacterales (E) threaten individuals and healthcare systems worldwide. Symptomatic infection caused by Enterobacterales is typically preceded by asymptomatic colonisation and often occurs in the most vulnerable individuals, so interrupting asymptomatic transmission is desirable. The dominant transmission routes across the healthcare continuum, including hospitals, intermediate care, and long-term care facilities, are not well understood. Methods: Here we present a protocol describing a genomic surveillance framework developed for the Tracking Antimicrobial Resistance Across Care Settings (TRACS) Liverpool programme, which aims to identify critical ESBL-E transmission points in hospitals and care homes in Liverpool, UK. Our study integrates individual participant and healthcare facility data, validated standard operating procedures for taking and culturing stool, rectal, environmental, and staff samples, genomic sequencing of ESBL-E, and statistical modelling approaches into a research framework for ESBL-E genomic surveillance. Discussion: There is a need for improved epidemiological and laboratory approaches to studying bacterial transmission. Drug-resistant enteric bacteria are a highly tractable marker of the movement of all enteric bacteria, and interventions designed to interrupt transmission of drug-resistant bacteria are expected to have a broader healthcare impact. This protocol provides a standardised, reproducible approach for identifying ESBL-E, tracking acquisition events, and linking clinical and environmental isolates through whole-genome sequencing.
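Linking clinical and environmental isolates through whole-genome sequencing is often done by thresholding pairwise SNP distances. The protocol abstract does not specify a threshold or clustering rule, so the sketch below is a hypothetical single-linkage grouping with an illustrative threshold, not the TRACS method; the function name and input layout are assumptions.

```python
def snp_clusters(distances, threshold):
    """Single-linkage grouping of isolates: any pair with SNP distance <= threshold
    is joined, and groups are closed transitively (union-find).

    distances: hypothetical layout, (isolate_a, isolate_b) -> pairwise SNP distance.
    Returns clusters as sorted lists, ordered by their first isolate id.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        ra, rb = find(a), find(b)          # also registers both isolates
        if d <= threshold:
            parent[ra] = rb
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Single linkage chains isolates transitively (A links to C through B even if A and C are far apart), which is why a threshold-based grouping is usually treated as a screen for putative transmission rather than proof of it.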